Data prefetching for linear algebra operations on high performance workstations

Authors

  • Elena Garcia
  • Jose R. Herrero
  • Juan J. Navarro
Abstract

In previous work it was shown that the performance of linear algebra computations, which access large amounts of data, depends on the behavior of the memory hierarchy. This research aims to use the multilevel orthogonal blocking approach in conjunction with other software techniques to further improve the performance of linear algebra computations. The performance of dense matrix-by-matrix multiplication executed on a superscalar high-performance workstation is improved by using binding and nonbinding prefetching to hide memory latency, together with the well-known technique of blocking. In this report our main goal is to improve the performance of matrix-by-matrix multiplication on a high-performance workstation. This widely applied operation has already been improved by means of techniques such as blocking and software pipelining. In this paper we show the improvements obtained by applying software prefetching in addition to those techniques. Prefetching is a technique used to hide memory latency by preventing the processor from stalling on data dependencies when a miss occurs. We first present an algorithm in which only blocking and software pipelining are used, followed by two new algorithms in which binding and nonbinding prefetching, respectively, have been added. Performance results start at 15 MFlops for the jki form. Blocking and software pipelining reach 128 MFlops for the best algorithm without prefetching, and finally performance improves to 166 MFlops when prefetching is applied, on a computer with a peak performance of 200 MFlops.
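The techniques named in the abstract can be illustrated with a minimal C sketch. It is not the authors' algorithm: the function names, the `IDX` macro, the column-major layout, and the block size parameter `bs` are all assumptions made for illustration. The first routine is a plain jki loop nest; the second adds blocking, a binding prefetch (the next element of B is loaded into a local variable one iteration early), and a nonbinding prefetch (a cache hint issued with `__builtin_prefetch`, a GCC/Clang extension rather than standard C).

```c
#include <assert.h>
#include <stddef.h>

/* Column-major index of element (i, j) with leading dimension ld (assumed layout). */
#define IDX(i, j, ld) ((size_t)(j) * (size_t)(ld) + (size_t)(i))

/* Baseline jki form: C += A * B for n x n matrices. */
void matmul_jki(size_t n, const double *A, const double *B, double *C)
{
    for (size_t j = 0; j < n; j++)
        for (size_t k = 0; k < n; k++) {
            double b = B[IDX(k, j, n)];
            for (size_t i = 0; i < n; i++)
                C[IDX(i, j, n)] += A[IDX(i, k, n)] * b;
        }
}

/* Hypothetical blocked variant with binding and nonbinding prefetching.
 * bs is an illustrative block size, not a value taken from the paper. */
void matmul_blocked_prefetch(size_t n, size_t bs,
                             const double *A, const double *B, double *C)
{
    assert(bs > 0);
    for (size_t jj = 0; jj < n; jj += bs) {
        size_t jmax = (jj + bs < n) ? jj + bs : n;
        for (size_t kk = 0; kk < n; kk += bs) {
            size_t kmax = (kk + bs < n) ? kk + bs : n;
            for (size_t ii = 0; ii < n; ii += bs) {
                size_t imax = (ii + bs < n) ? ii + bs : n;
                for (size_t j = jj; j < jmax; j++) {
                    double b = B[IDX(kk, j, n)];
                    for (size_t k = kk; k < kmax; k++) {
                        double b_next = 0.0;
                        if (k + 1 < kmax) {
                            /* Binding prefetch: the value itself is loaded early. */
                            b_next = B[IDX(k + 1, j, n)];
                            /* Nonbinding prefetch: only a hint to the cache
                             * for the column of A used in the next iteration. */
                            __builtin_prefetch(&A[IDX(ii, k + 1, n)], 0, 3);
                        }
                        for (size_t i = ii; i < imax; i++)
                            C[IDX(i, j, n)] += A[IDX(i, k, n)] * b;
                        b = b_next;
                    }
                }
            }
        }
    }
}
```

Both routines accumulate into C, so the caller is expected to zero C first. Whether the prefetches help at all depends on the compiler, the cache sizes, and the chosen block size; the sketch only shows where each kind of prefetch sits in the loop nest.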

Similar resources

Improving performance of linear algebra algorithms for dense matrices, using algorithmic prefetch

In this paper, we introduce a concept called algorithmic prefetching, for exploiting some of the features of the IBM RISC System/6000 computer. Algorithmic prefetching denotes changing algorithm A to algorithm B, which contains additional steps to move data from slower levels of memory to faster levels, with the aim that algorithm B outperform algorithm A. The objective of algorithmic prefetc...

Data Allocation Strategies for Dense Linear Algebra Kernels on Heterogeneous Two-dimensional Grids

We study the implementation of dense linear algebra computations, such as matrix multiplication and linear system solvers, on two-dimensional (2D) grids of heterogeneous processors. For these operations, 2D-grids are the key to scalability and efficiency. The uniform block-cyclic data distribution scheme commonly used for homogeneous collections of processors limits the performance of these opera...

The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems

A current trend in high-performance computing is to decompose a large linear algebra problem into batches containing thousands of smaller problems, that can be solved independently, before collating the results. To standardize the interface to these routines, the community is developing an extension to the BLAS standard (the batched BLAS), enabling users to perform thousands of small BLAS opera...

Comprehensive Hardware and Software Support for Operating Systems to Exploit MP Memory Hierarchies

High-performance multiprocessor workstations are becoming increasingly popular. Since many of the workloads running on these machines are operating-system intensive, we are interested in what sort of support for the operating system should the memory hierarchy of these machines provide. This paper addresses this question. This paper shows that the largest performance losses for the operating sy...

Comprehensive Hardware and Software Support for Operating Systems to Exploit

High-performance multiprocessor workstations are becoming increasingly popular. Since many of the workloads running on these machines are operating-system intensive, we are interested in exploring the types of support for the operating system that the memory hierarchy of these machines should provide. In this paper, we evaluate a comprehensive set of hardware and software supports that minimiz...

Publication date: 1995